An Improvement Method for The Ambiguous Fragments Discovery in Chinese Word Segmentation

نویسندگان

  • Pieqian Liu
  • Jingjing Feng
چکیده

Disambiguation is a difficult task in Chinese Automatic Word Segmentation, and the ambiguous fragments discovery is the foundation of the disambiguation. This article proposes a method named Bidirectional Maximum Matching and Retroversion Multiword to discover the ambiguous fragments, which can deal with the overlapping ambiguity fragments of the long precision. Some experiments show that this method achieves better result than existing methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Hybrid Approach to Word Segmentation and POS Tagging

In this paper, we present a hybrid method for word segmentation and POS tagging. The target languages are those in which word boundaries are ambiguous, such as Chinese and Japanese. In the method, word-based and character-based processing is combined, and word segmentation and POS tagging are conducted simultaneously. Experimental results on multiple corpora show that the integrated method has ...

متن کامل

Integrating Ngram Model and Case-based Learning for Chinese Word Segmentation

This paper presents our recent work for participation in the First International Chinese Word Segmentation Bakeoff (ICWSB-1). It is based on a generalpurpose ngram model for word segmentation and a case-based learning approach to disambiguation. This system excels in identifying in-vocabulary (IV) words, achieving a recall of around 96-98%. Here we present our strategies for language model trai...

متن کامل

A Chinese word segmentation based on language situation in processing ambiguous words

While the processing of natural language is beneficial to the text mining, Chinese word segmentation is an important step in the processing of Chinese natural language. In this paper, the convergence essence of the segmentation process is analyzed, and a theory of Chinese word segmentation based on language situation is deducted. Based on the segmentation theory, an algorithm of Chinese word se...

متن کامل

Chinese word segmentation and its effect on information retrieval

A set of IR experiments was carried out to study the impact of Chinese word segmentation and its effect on information retrieval (IR) at the Division of Information Studies, Nanyang Technological University, Singapore. A total of four automatic character-based segmentation approaches and a manual word segmentation approach was first carried out to obtain the word segments for indexing and to ev...

متن کامل

Exploiting unlabeled internal data in conditional random fields to reduce word segmentation errors for Chinese texts

The application of text-to-speech (TTS) conversion has become widely used in recent years. Chinese TTS faces several unique difficulties. The most critical is caused by the lack of word delimiters in written Chinese. This means that Chinese word segmentation (CWS) must be the first step in Chinese TTS. Unfortunately, due to the ambiguous nature of word boundaries in Chinese, even the best CWS s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010